SenseClusters - Finding Clusters that Represent Word Senses
Abstract
SenseClusters is a freely available word sense discrimination system that takes a purely unsupervised clustering approach. It uses no knowledge other than what is available in a raw unstructured corpus, and clusters instances of a given target word based only on their mutual contextual similarities. It is a complete system that provides support for feature selection from large corpora, several different context representation schemes, various clustering algorithms, and evaluation of the discovered clusters.
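The idea of grouping instances of a target word purely by the similarity of their contexts can be illustrated with a toy sketch. This is not SenseClusters' own code or interface: the function names, the tiny stoplist, the example sentences, and the choice of bag-of-words vectors with average-link agglomerative clustering are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical stoplist; a real system would select features from a large corpus.
STOPWORDS = {"the", "a", "of", "on", "at", "to", "in", "and", "my", "we", "i"}

# Four made-up contexts of the target word "bank".
CONTEXTS = [
    "I deposited money at the bank and opened a savings account.",
    "The bank approved my loan and the money reached my account.",
    "We walked along the river bank and watched the water.",
    "The muddy bank of the river was covered by water.",
]


def context_vector(context, target):
    """Bag-of-words counts for one context, without the target word and stopwords."""
    tokens = [t.strip(".,").lower() for t in context.split()]
    return Counter(t for t in tokens if t and t != target and t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_contexts(contexts, target, k):
    """Average-link agglomerative clustering of contexts into k clusters.

    Returns a list of clusters, each a list of indices into `contexts`.
    """
    vectors = [context_vector(c, target) for c in contexts]
    clusters = [[i] for i in range(len(contexts))]
    while len(clusters) > k:
        best = None  # (similarity, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


clusters = cluster_contexts(CONTEXTS, "bank", 2)
print(clusters)  # [[0, 1], [2, 3]]
```

Here the two financial contexts share "money" and "account", and the two river contexts share "river" and "water", so no sense labels or external knowledge are needed to separate them. The number of clusters is fixed at 2 by hand in this sketch.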
Similar resources
Automatic Cluster Stopping with Criterion Functions and the Gap Statistic
SenseClusters is a freely available system that clusters similar contexts. It can be applied to a wide range of problems, although here we focus on word sense and name discrimination. It supports several different measures for automatically determining the number of clusters in which a collection of contexts should be grouped. These can be used to discover the number of senses in which a word i...
An Unsupervised Vector Approach to Biomedical Term Disambiguation: Integrating UMLS and Medline
This paper introduces an unsupervised vector approach to disambiguate words in biomedical text that can be applied to all-word disambiguation. We explore using contextual information from the Unified Medical Language System (UMLS) to describe the possible senses of a word. We experiment with automatically creating individualized stoplists to help reduce the noise in our dataset. We compare our ...
Identifying Similar Words and Contexts in Natural Language with SenseClusters
SenseClusters is a freely available intelligent system that clusters together similar contexts in natural language text. Thereafter it assigns identifying labels to these clusters based on their content. It is a purely unsupervised approach that is language independent, and uses no knowledge other than what is available in raw un-annotated corpora. In addition to clustering similar contexts, it...
UMND2: SenseClusters Applied to the Sense Induction Task of Senseval-4
SenseClusters is a freely available open-source system that served as the University of Minnesota, Duluth entry in the SENSEVAL-4 sense induction task. For this task SenseClusters was configured to construct representations of the instances to be clustered using the centroid of word co-occurrence vectors that replace the words in an instance. These instances are then clustered using k-means whe...
Duluth-WSI: SenseClusters Applied to the Sense Induction Task of SemEval-2
The Duluth-WSI systems in SemEval-2 built word co-occurrence matrices from the task test data to create a second-order co-occurrence representation of those test instances. The senses of words were induced by clustering these instances, where the number of clusters was automatically predicted. The Duluth-Mix system was a variation of WSI that used the combination of training and test data to cr...